Skip to content

Add concolic execution PoC to pyk#1

Open
Stevengre wants to merge 5 commits into
developfrom
feature/concolic-execution
Open

Add concolic execution PoC to pyk#1
Stevengre wants to merge 5 commits into
developfrom
feature/concolic-execution

Conversation

@Stevengre

@Stevengre Stevengre commented Jun 1, 2026

Copy link
Copy Markdown
Owner

Summary

A proof-of-concept concolic (concrete + symbolic) execution engine in a new pyk.concolic
package, built on pyk's existing CTermSymbolic / Kore RPC interface. No new backend and no
changes to the Java/Scala compiler.

The defining idea: a fast, deterministic concrete run produces an execution trace, which is
reused to drive a symbolic run down the same path — so branch selection is a rule-id lookup
instead of an SMT/simplify call per branch. The solver is consulted only to synthesize a new input
that diverges from the current path (DART/SAGE).

What's included

  • pyk/src/pyk/concolic/_engine.pyConcolicEngine:
    • concrete_trace() — ground run → ordered rule_id trace (from rewrite logs)
    • trace() — symbolic replay, branch selected by matching rule ids against the concrete trace
    • flipped_inputs() / explore() — negate a path-condition prefix, solve, and explore via a worklist
    • ConcolicTrace / PathConstraint data types
  • pyk/src/pyk/concolic/DESIGN.md — target architecture: the remainder-driven DART loop (ck/rk),
    the chosen Haskell guided-step backend feature, and the v0 → v3 milestones
  • pyk/src/pyk/concolic/README.md — what is implemented today, with architecture + flow diagrams
  • pyk/src/tests/unit/test_concolic.py — toolchain-free tests of the branch-matching / log-extraction logic
  • pyk/src/tests/integration/concolic/test_imp_concolic.py — end-to-end over the existing imp.k

Status

This is v0 per DESIGN.md: the symbolic replay selects branches by rule_id but still uses the
existing execute (which forks internally). The agreed v1 replaces this with a Haskell guided-step
RPC that applies a named rule deterministically and returns the applied condition + the remainder
(the sibling-branch condition that drives new-input generation) — eliminating forking and per-branch SMT.

Testing

  • make check (mypy / flake8 / isort / autoflake / pydocstyle / black) — pass
  • pytest src/tests/unit/test_concolic.py — 7 passed
  • pytest src/tests/integration/concolic/test_imp_concolic.py — 6 passed (legacy Haskell backend + Booster), against K v7.1.322

Scope / limitations

Deliberately small PoC: branch feasibility is witnessed by a real concrete run; loops are bounded by
max_step / max_branches / max_iterations; no CLI yet. v1 backend work and a coverage/directed
search strategy are the natural follow-ups (see DESIGN.md).

🤖 Generated with Claude Code

Stevengre added 5 commits June 1, 2026 14:33
Introduce a proof-of-concept concolic (concrete + symbolic) execution
engine in the new `pyk.concolic` package, built on top of the existing
`CTermSymbolic` interface. It adds no new backend; it drives the K
symbolic backend (Kore RPC / Haskell backend or Booster) along the single
path selected by a concrete input, records the path condition, and uses
`get_model` to synthesize new inputs that flip branches (DART/SAGE loop).

- `ConcolicEngine.trace` / `flipped_inputs` / `explore`
- `ConcolicTrace` / `PathConstraint` data types
- unit test with a toolchain-free fake symbolic interpreter
- end-to-end IMP integration test (legacy Haskell backend + Booster)
- README with architecture and exploration-flow diagrams
Restructure the PoC engine into two passes that reuse a concrete trace,
and document the target architecture.

- concrete_trace(): run with the input substituted in (deterministic, no
  fork) and harvest the ordered rule ids from the rewrite logs.
- trace(): replay symbolically, selecting each branch by matching rule
  ids against the concrete trace instead of an SMT/simplify call per
  candidate.
- rewrite the unit test around the new branch-matching / log-extraction
  helpers; refresh the README architecture + flow diagrams.
- add DESIGN.md: the remainder-driven DART loop (ck/rk), the chosen
  Haskell guided-step backend feature, and the v0 -> v3 milestones.
Replace post-hoc condition negation with backend-provided sibling
predicates (the remainder rk), realizing the remainder-driven DART loop
on the existing execute RPC (no backend change required).

- PathConstraint gains `siblings`: the rule_predicate of each non-chosen
  successor at a branch node (the rk alternatives).
- trace() records ck (chosen) + rk (siblings) at each branch; _match_branch
  now returns the sibling successors alongside the chosen one.
- diverging_inputs() replaces flipped_inputs(): for each branch node i and
  each sibling rk, solve prefix_{<i} & rk; feasible models are new inputs,
  infeasible siblings are pruned. Handles multi-way branches correctly.
- add a two-branch IMP integration test asserting exactly the 3 feasible
  leaves are found and the UNSAT combination is never generated.
- refresh README diagrams and DESIGN status accordingly.

Verified: 7 unit + 8 integration (legacy + Booster) pass; black/isort/
flake8/autoflake/mypy clean.
A skipif-guarded (CONCOLIC_BENCH=1) benchmark over a scaling family of
branchy IMP programs. Instruments the engine's execute / get_model calls
to measure where exploration time goes.

Measured (Booster, K v7.1.322): forward-replay `execute` is 97-98% of
wall time across k=1..6 and grows ~O(k^2) in branching calls, while
diverging-input `get_model` is ~1-2% and shrinks. This justifies the
Haskell guided-step optimization, which targets the forward-replay cost.
Thread the optional `assume-remainder-unsat` execute flag from the
concolic engine's forward replay down to the Kore RPC request.

- KoreClient.execute: new optional `assume_remainder_unsat` param
  (omitted from the request when None; ignored by backends that lack it).
- ConcolicEngine: new `assume_remainder_unsat` option (default True),
  passed on every replay `execute`. On a backend that supports the flag
  this skips the per-branch remainder-feasibility SMT identified as
  ~97% of replay time by the benchmark.

End-to-end verified against a locally-built kore-rpc-booster carrying the
flag: a rule with a satisfiable remainder aborts with the flag off and
proceeds (stuck/finished) with it on; concolic still finds all feasible
paths. Existing tests pass against the stock binary (flag ignored).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant